Abstract

In this paper we derive a new framework for independent component analysis (ICA), called measure- 
transformed ICA (MTICA), that is based on applying a structured transform to the probability distribution 
of the manifest vector, i.e., transformation of the probability measure defined on its observation space. 
By judicious choice of the transform we show that the separation matrix can be uniquely determined 
via diagonalization of some measure-transformed covariance matrices. In MTICA the separation matrix 
is estimated via approximate joint diagonalization of some empirical measure-transformed covariance 
matrices. Unlike kernel based ICA techniques where the transformation is applied repetitively to some 
affine mappings of the manifest vector, in MTICA the transformation is applied to its probability 
distribution only once. This results in performance advantages and reduced implementation complexity. 
The proposed approach is illustrated in extensive simulation examples that show its advantages as 
compared to other existing state-of-the-art methods for ICA. 

Index Terms 

Approximate joint diagonalization, blind source separation, independent component analysis, proba- 
bility measure transform. 

I. Introduction 

Independent component analysis (ICA) is a technique for multivariate data analysis that aims at
decomposing an observed random vector, also called the manifest vector, into a linear combination of
mutually independent random variables [1], [2]. The manifest vector is assumed to be generated by
an unknown linear mixture of mutually independent latent variables, called sources, with unknown
distributions. The coefficient matrix of the linear mixture is called the mixing matrix and is assumed
to be invertible. Given a sequence of i.i.d. samples from the distribution of the observed vector, ICA
aims to estimate the inverse of the mixing matrix, called the separation matrix, which is used for
recovering the sources. Unlike principal component analysis, ICA can deal with a general mixing
structure, which is not constrained to be orthogonal. The mutual independence assumption is plausible
in a wide variety of fields, including telecommunications [3], [4], finance [5], [6], and biomedical
signal analysis [7], [8], which makes ICA a natural tool for blind source separation in instantaneous
linear mixtures.

ICA algorithms can be categorized into parametric and semi-parametric classes. Parametric ICA
methods involve specifying parametric models for the probability distributions of the sources followed
by optimization of contrast functions that involve both the mixing matrix and the model's nuisance
parameters. Generally, these contrast functions are based on the likelihood function [9]-[12], on
non-Gaussianity measures such as kurtosis [13], or on high-order correlations such as fourth-order
cross-cumulants [14], [15]. The main drawback of these techniques is that they might fail whenever the
modeling assumptions are not satisfied. Unlike parametric ICA techniques, semi-parametric ICA methods
[16]-[19] assume nothing about the probability distributions of the sources, which makes them more
robust to varying source distributions.

Another way to classify ICA algorithms is to divide them into data-based and statistically-based 
techniques. Data-based techniques [9], [10], [13], [16]-[18] involve successive linear transformations 
that are applied to the data until some criterion of independence is maximized. These techniques require 
storage of the entire data since it must be re-analyzed at each iteration. Unlike data-based techniques, in 
statistically-based methods [1], [12], [15], [19], [20], the data is condensed into a smaller set of summary 
statistics that are computed only once. These summary statistics are then used to estimate the separation 
matrix. 

In this paper we introduce a new semi-parametric statistically-based ICA framework. The proposed 
framework, called measure-transformed ICA (MTICA), is inspired by a measure transformation approach 
that was recently applied to canonical correlation analysis [21]. MTICA is based on applying a transform 
to the probability distribution of the manifest vector, i.e., transformation of the probability measure 
defined on the observation space. The proposed transform is structured by a non-negative function called 
the MT-function. It preserves statistical independence and maps the probability distribution into a set of 
new probability measures on the observation space. By modifying the MT-function, classes of measure 
transformations can be obtained that have different useful properties. Under the proposed transform we 
define the measure-transformed (MT) covariance and derive its strongly consistent estimate. In MTICA
the separation matrix is estimated via approximate joint diagonalization [22]-[34] of some empirical
measure-transformed covariance matrices.

The MT-function can be selected from either exponential or Gaussian families of functions
parameterized by scale and translation parameters. When we use the exponential MT-function the
corresponding measure-transformed covariance matrix of the manifest vector is equal to the Hessian of
the log-moment-generating function, resulting in the ICA method proposed in [19], which we call here
exponential-MTICA. In [19] the author showed that if at most one of the sources is Gaussian, then the
mixing matrix can be uniquely identified, up to scaling and permutations of its columns, via a
non-symmetric eigenvalue decomposition that involves two Hessians of the log-moment-generating
function. Based on this property, exponential-MTICA estimates the separation matrix via non-orthogonal
approximate joint diagonalization (NOAJD) [22]-[32] over a set of empirical exponential MT-covariance
matrices. These matrices are obtained by evaluating the exponential MT-function at different
test-points in the parameter space.

Under the Gaussian MT-function a new technique for ICA, called Gaussian-MTICA, is obtained. We 
show that if at most one of the sources is Gaussian, then the unitary mixing matrix associated with 
the whitened manifest vector can be uniquely identified via symmetric eigenvalue decomposition of a 
single Gaussian MT-covariance matrix. Gaussian-MTICA estimates the separation matrix via empirical 
whitening and orthogonal approximate joint diagonalization (OAJD) [33], [34] over a set of empirical 
Gaussian MT-covariance matrices. These matrices are obtained by evaluating the Gaussian MT-function 
at different test-points in the parameter space. 

The MTICA algorithms have the following advantages over existing state-of-the-art ICA methods:
1) Similarly to semi-parametric ICA techniques, such as kernel-ICA-KGV (KGV) [ ], fast kernel-ICA
(FKICA) [ ], and RADICAL [ ], the MTICA methods do not rely on restrictive assumptions about
the distribution of the sources. Therefore, unlike parametric ICA methods such as fast-ICA (FICA) [ ],
JADE [ ] and extended Infomax (EIMAX) [10], the MTICA methods are more robust to varying
source distributions.
2) The MTICA algorithms are comprised of a non-iterative part that involves estimation of the
MT-covariance matrices followed by an iterative part that involves approximate joint diagonalization.
The non-iterative part has computational complexity that is linear in the sample size, while the
computational complexity of the iterative part is independent of the sample size. This results in
reduced computational complexity in comparison to data-based techniques such as KGV, FKICA, and
RADICAL, whose computational complexity is super-linear in the sample size.
3) In contrast to KGV and FKICA, the MTICA techniques do not expand the dimension of the observed
vector, nor do they require regularization of the measure-transformed covariance matrices.
4) Unlike KGV and FKICA, which involve complex optimization over the Stiefel manifold [35], the MTICA
methods are easy to implement and only involve simple estimation of some MT-covariance matrices
followed by approximate joint diagonalization, which can be performed with off-the-shelf algorithms
[22]-[34].
5) The Gaussian MT-function is bounded and has the property that it de-emphasizes samples distant from
its location parameter. Consequently, unlike cumulant-based techniques such as JADE and FICA,
Gaussian-MTICA is highly robust to outliers.
6) Unlike ICA techniques that are based on whitening and unitary de-mixing, the exponential-MTICA
algorithm is more robust to model mismatch scenarios where the whitened observations do not
admit unitary mixing.

The proposed MTICA approach is evaluated in extensive simulation examples that illustrate the
advantages relative to other state-of-the-art ICA techniques, such as FICA, JADE, EIMAX, KGV, FKICA,
and RADICAL.

The paper is organized as follows. In Section II, we review the ICA problem. In Section III, the
MTICA procedure is derived. In Section IV, the exponential-MTICA method and its relation to [19] are
discussed. In Section V, the Gaussian-MTICA method is developed. Comparisons between
exponential-MTICA and Gaussian-MTICA are given in Section VI. In Section VII the computational
complexity of the MTICA algorithms is determined and compared to those of other ICA techniques. In
Section VIII, the performance of the proposed approach is compared to other ICA techniques via
simulation experiments. In Section IX, the main points of this contribution are summarized. The
propositions and theorems stated throughout the paper are proved in the Appendix.



II. Independent component analysis: Review

A. Preliminaries

Let $X = [X_1, \ldots, X_p]^T$ denote a random vector, whose observation space is given by
$\mathcal{X} \subseteq \mathbb{R}^p$. We define the measure space $(\mathcal{X}, S_X, P_X)$, where
$S_X$ is a $\sigma$-algebra over $\mathcal{X}$, and $P_X$ is the joint probability measure on $S_X$.
Let $\mathcal{X}_k$ denote the observation space of $X_k$. The marginal probability measure of $P_X$
on $S_{X_k}$ is denoted by $P_{X_k}$, where $S_{X_k}$ is the $\sigma$-algebra over $\mathcal{X}_k$.
Let $g(\cdot)$ denote an integrable scalar function on $\mathcal{X}$. The expectation of $g(X)$ under
$P_X$ is defined as

$$E[g(X); P_X] \triangleq \int_{\mathcal{X}} g(x)\, dP_X(x), \tag{1}$$

where $x \in \mathcal{X}$. The components of $X$ will be said to be mutually independent under $P_X$ if

$$E[g(X_j)\, h(X_k); P_X] = E[g(X_j); P_{X_j}]\, E[h(X_k); P_{X_k}] \quad \forall j \neq k \tag{2}$$

for all integrable scalar functions $g(\cdot)$, $h(\cdot)$ on $\mathcal{X}$. The components of $X$
will be said to be mutually uncorrelated under $P_X$ if

$$E[X_j X_k; P_X] = E[X_j; P_{X_j}]\, E[X_k; P_{X_k}] \quad \forall j \neq k. \tag{3}$$

B. Independent component analysis 

The instantaneous noiseless ICA model takes the following form: 

X = AS, (4) 

where $X \in \mathbb{R}^p$, $p \geq 2$, is an observed random vector, $A \in \mathbb{R}^{p \times p}$
is an invertible unknown matrix, called the mixing matrix, and $S \in \mathbb{R}^p$ is a latent random
vector comprised of mutually independent variables having finite second-order moments and unknown
distributions. The components of $S$ are also called sources. Under the model (4) it has been shown
[1], [2], [36], [37] that the mixing matrix $A$ can be uniquely identified, up to permutation and
scaling of its columns, if and only if at most one of the sources is Gaussian. Given a sequence of
i.i.d. samples from $P_X$, ICA aims to estimate the separation matrix $B = A^{-1}$ and thus recover
the sources using the relation $S = BX$.

Many state-of-the-art ICA algorithms, such as JADE, FICA, EIMAX, KGV, FKICA and RADICAL,
referenced in Section I, apply whitening to the observed vector $X$. The whitened observation vector is
given by

$$Z = WX = US, \tag{5}$$

where $W$ is the whitening matrix and $U = WA$. Assuming, without loss of generality, that the
components of $S$ have unit variances, one can easily verify that the matrix $U$ is unitary, leading to
a unitary mixing model. Let $V = U^T$, where $(\cdot)^T$ denotes the transpose operator. ICA algorithms
that use whitening compute an estimate $\hat{V}$ of $V$ using constrained optimization over the Stiefel
manifold of unitary matrices [35]. The empirical separation matrix is then obtained using the relation
$\hat{B} = \hat{V}\hat{W}$, where $\hat{W}$ is the empirical whitening matrix.
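
The whitening step in (5) can be illustrated numerically. The following is a minimal sketch, not the authors' implementation: the function name `whiten`, the symmetric inverse-square-root choice of the whitening matrix, the uniform sources and the random mixing matrix are all assumptions made for illustration. It checks that the whitened data have identity sample covariance and that U = WA is close to unitary when the sources have unit variance.

```python
import numpy as np

def whiten(X):
    """Return (Z, W) such that Z = W (X - mean) has identity sample covariance.

    X is a p x N array whose columns are samples. W is the symmetric inverse
    square root of the sample covariance (one valid choice among many).
    """
    Xc = X - X.mean(axis=1, keepdims=True)
    cov = Xc @ Xc.T / X.shape[1]
    evals, evecs = np.linalg.eigh(cov)
    W = evecs @ np.diag(evals ** -0.5) @ evecs.T
    return W @ Xc, W

rng = np.random.default_rng(0)
p, N = 3, 50_000
# Hypothetical unit-variance, zero-mean sources and a hypothetical mixing matrix.
S = rng.uniform(-np.sqrt(3), np.sqrt(3), size=(p, N))
A = rng.normal(size=(p, p))

Z, W = whiten(A @ S)
U = W @ A
# Largest deviation of U U^T from the identity; small for large N.
orth_err = np.abs(U @ U.T - np.eye(p)).max()
```

Any matrix of the form QW with Q unitary is an equally valid whitening matrix, which is why the remaining unitary factor must be estimated separately.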

III. Measure transformed ICA 

In this section the MTICA procedure is derived. First, a transform which maps the probability measure
$P_X$ into a set of probability measures $\{Q_X^{(u)}\}$ on $S_X$ is defined, which has the property
that it preserves mutual independence between the components of $X$ under $P_X$. Second, we define the
measure-transformed covariance and derive its strongly consistent estimate. Finally, based on the
mixing models (4), (5) we derive the MTICA procedure that applies approximate joint diagonalization to
a set of empirical measure-transformed covariance matrices.

A. Probability measure transform

Definition 1. Given a non-negative function $u : \mathbb{R}^p \to \mathbb{R}_+$ satisfying

$$u(x) = \prod_{k=1}^{p} u_k(x_k), \quad u_k : \mathbb{R} \to \mathbb{R}_+, \quad k = 1, \ldots, p, \tag{6}$$

and

$$0 < E[u(X); P_X] < \infty, \tag{7}$$

a transform on the probability measure $P_X$ is defined via the following relation

$$Q_X^{(u)}(A) \triangleq T_u[P_X](A) = \int_A \varphi_u(x)\, dP_X(x), \tag{8}$$

where $A \in S_X$, $x = [x_1, \ldots, x_p]^T \in \mathcal{X}$, and

$$\varphi_u(x) \triangleq \frac{u(x)}{E[u(X); P_X]}. \tag{9}$$

The function $u(\cdot)$, associated with the transform $T_u[\cdot]$, is called the MT-function.
In the following Proposition, some properties of the measure transform (8) are given.

Proposition 1. Let $Q_X^{(u)}$ be defined by relation (8). Then

1) $Q_X^{(u)}$ is a probability measure on $S_X$.

2) $Q_X^{(u)}$ is absolutely continuous w.r.t. $P_X$, with Radon-Nikodym derivative [ ] given by

$$\frac{dQ_X^{(u)}(x)}{dP_X(x)} = \varphi_u(x). \tag{10}$$

3) Assume that the MT-function $u(\cdot)$ is strictly positive, then $P_X$ is absolutely continuous
w.r.t. $Q_X^{(u)}$ with a strictly positive Radon-Nikodym derivative given by

$$\frac{dP_X(x)}{dQ_X^{(u)}(x)} = \frac{1}{\varphi_u(x)} = \frac{u^{-1}(x)}{E[u^{-1}(X); Q_X^{(u)}]}. \tag{11}$$

4) If $X_1, \ldots, X_p$ are mutually independent under $P_X$, then they are mutually independent
under $Q_X^{(u)}$.

[A proof is given in Appendix A]

By modifying the MT-function $u(\cdot)$, such that the conditions (6), (7) are satisfied, an
arbitrarily large set of probability measures on $S_X$ can be obtained.

B. The measure-transformed covariance

According to (1) and (10) the measure-transformed covariance of $X$ under $Q_X^{(u)}$ is given by

$$\Sigma_X^{(u)} = E[XX^T \varphi_u(X); P_X] - E[X \varphi_u(X); P_X]\, E[X^T \varphi_u(X); P_X]. \tag{12}$$

Equation (12) implies that $\Sigma_X^{(u)}$ is a weighted covariance matrix of $X$ under $P_X$, with
weighting function $\varphi_u(\cdot)$. Hence, $\Sigma_X^{(u)}$ can be estimated using only samples
from the distribution $P_X$. By modifying the MT-function $u(\cdot)$, such that the conditions (6), (7)
are satisfied, the MT-covariance matrix under $Q_X^{(u)}$ is modified. In particular, by choosing
$u(x) = 1$, we have $Q_X^{(u)} = P_X$, and the standard covariance matrix is obtained. In the following
Proposition a strongly consistent estimate of the measure-transformed covariance is given that is
based on i.i.d. samples from the probability distribution $P_X$.

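Proposition 2. Let $X_n$, $n = 1, \ldots, N$ denote a sequence of i.i.d. samples from the distribution
$P_X$, and define the empirical covariance estimate

$$\hat{\Sigma}_X^{(u)} \triangleq \frac{1}{N-1} \sum_{n=1}^{N} X_n X_n^T \hat{\varphi}_u(X_n) - \frac{N}{N-1}\, \hat{\mu}_X^{(u)} \hat{\mu}_X^{(u)T}, \tag{13}$$

where

$$\hat{\mu}_X^{(u)} \triangleq \frac{1}{N} \sum_{n=1}^{N} X_n \hat{\varphi}_u(X_n), \tag{14}$$

and

$$\hat{\varphi}_u(X_n) \triangleq \frac{u(X_n)}{\frac{1}{N} \sum_{n=1}^{N} u(X_n)}. \tag{15}$$

Assume

$$E[u^2(X); P_X] < \infty \quad \text{and} \quad E[X_k^4; P_X] < \infty \quad \forall k = 1, \ldots, p. \tag{16}$$

Then $\hat{\Sigma}_X^{(u)} \to \Sigma_X^{(u)}$ almost surely as $N \to \infty$. [The proof is
similar to the proof of Proposition 3 in [21] and therefore is omitted]

Note that for $u(X) = 1$ the estimator $\hat{\Sigma}_X^{(u)}$ reduces to the standard unbiased
estimator of the covariance matrix $\Sigma_X$.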
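
A weighted-covariance estimate of this kind takes only a few lines to compute. The sketch below is an illustration rather than the authors' code: it implements the plain plug-in version of (12) with weights normalized to sum to one, which is an assumption on our part and may differ from the estimator of Proposition 2 by finite-sample normalization factors that vanish as N grows. The function name `mt_covariance`, the Laplace test data and the Gaussian-type weight are hypothetical.

```python
import numpy as np

def mt_covariance(X, u):
    """Plug-in estimate of the MT-covariance in (12).

    X : (p, N) array whose columns are i.i.d. samples.
    u : callable mapping a (p, N) array to N non-negative MT-function values.
    """
    w = np.asarray(u(X), dtype=float)
    w = w / w.sum()                       # normalized weights, sum to one
    mu = X @ w                            # weighted mean, cf. (14)
    return (X * w) @ X.T - np.outer(mu, mu)

rng = np.random.default_rng(1)
X = rng.laplace(size=(2, 10_000))

# u(x) = 1 recovers the ordinary (1/N-normalized) sample covariance.
C_flat = mt_covariance(X, lambda x: np.ones(x.shape[1]))

# Hypothetical Gaussian-type weight centred at t with width tau.
t, tau = np.array([0.5, -0.5]), 1.0
C_gauss = mt_covariance(
    X, lambda x: np.exp(-np.sum((x - t[:, None]) ** 2, axis=0) / (2 * tau ** 2))
)
```

The cost is one pass over the data per MT-function, which is the source of the linear dependence on the sample size discussed in Section VII.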



C. The MTICA procedure 

In MTICA we choose a sequence of MT-functions $u_m(\cdot)$, $m = 1, \ldots, M$ that satisfies at least
one of the following conditions:
1) Under the ICA model (4) the separation matrix $B$ is the unique matrix (up to permutation and
scaling of its rows) that jointly diagonalizes the MT-covariance matrices $\Sigma_X^{(u_m)}$,
$m = 1, \ldots, M$.
2) Under the unitary mixing model (5) the matrix $V = U^T$ is the unique matrix (up to permutation
and sign of its rows) that jointly diagonalizes the MT-covariance matrices $\Sigma_Z^{(u_m)}$,
$m = 1, \ldots, M$.

When the first condition is satisfied, the separation matrix $B$ is estimated via NOAJD of the
empirical MT-covariances $\hat{\Sigma}_X^{(u_m)}$, $m = 1, \ldots, M$. The NOAJD [22]-[32] seeks a
non-singular matrix $\hat{B} \in \mathbb{R}^{p \times p}$, such that
$\hat{B} \hat{\Sigma}_X^{(u_m)} \hat{B}^T$, $m = 1, \ldots, M$ are "as diagonal as possible" in the
sense that a deviation measure from diagonality is minimized. The MTICA procedure in this case is
summarized in Algorithm 1.

Algorithm 1 MTICA with no whitening

Input: A sequence of data samples $X_n$, $n = 1, \ldots, N$.
1: Choose a sequence of MT-functions $u_m(\cdot)$, $m = 1, \ldots, M$, such that $B$ is the unique
   joint diagonalization matrix of $\Sigma_X^{(u_m)}$, $m = 1, \ldots, M$.
2: Using (13)-(15) derive the empirical MT-covariances $\hat{\Sigma}_X^{(u_m)}$, $m = 1, \ldots, M$.
3: Find the NOAJD matrix $\hat{B}$ of $\hat{\Sigma}_X^{(u_m)}$, $m = 1, \ldots, M$.
Output: The empirical separation matrix $\hat{B}$.



Alternatively, when the second condition is satisfied the observations are whitened, and the estimate
of $V$ is obtained via OAJD of the empirical MT-covariance matrices $\hat{\Sigma}_{\hat{Z}}^{(u_m)}$,
$m = 1, \ldots, M$, where $\hat{Z} = \hat{W}X$ and $\hat{W}$ is the empirical whitening matrix. The
OAJD [33], [34] seeks a unitary matrix $\hat{V} \in \mathbb{R}^{p \times p}$, such that
$\hat{V} \hat{\Sigma}_{\hat{Z}}^{(u_m)} \hat{V}^T$, $m = 1, \ldots, M$ are "as diagonal as possible"
by, once again, minimizing a deviation measure from diagonality. The empirical separation matrix is
obtained by taking $\hat{B} = \hat{V}\hat{W}$. The MTICA procedure in this case is summarized in
Algorithm 2.

Algorithm 2 MTICA with whitening

Input: A sequence of data samples $X_n$, $n = 1, \ldots, N$.
1: Choose a sequence of MT-functions $u_m(\cdot)$, $m = 1, \ldots, M$, such that $V$ is the unique
   joint diagonalization matrix of $\Sigma_Z^{(u_m)}$, $m = 1, \ldots, M$.
2: Estimate the whitening matrix $\hat{W}$.
3: Generate the sequence $\hat{Z}_n = \hat{W}X_n$, $n = 1, \ldots, N$.
4: Using (13)-(15) derive the empirical MT-covariances $\hat{\Sigma}_{\hat{Z}}^{(u_m)}$,
   $m = 1, \ldots, M$.
5: Find the OAJD matrix $\hat{V}$ of $\hat{\Sigma}_{\hat{Z}}^{(u_m)}$, $m = 1, \ldots, M$.
Output: Obtain an estimate of $B$ by taking $\hat{B} = \hat{V}\hat{W}$.

By modifying the MT-functions such that the stated conditions are satisfied, a family of
measure-transformed independent component analyses can be obtained. Particular choices of MT-functions
leading to the exponential and Gaussian MTICA algorithms are discussed in the succeeding sections.

IV. Exponential-MTICA

In this section we parameterize the MT-function $u(\cdot; t)$, with scaling parameter
$t \in \mathbb{R}^p$, under the exponential family of functions. Under this choice of MT-function the
MT-covariance is given by the Hessian of the log-moment-generating function, resulting in the ICA
algorithm proposed in [19].

February 5, 2013, DRAFT. Submitted to the IEEE Transactions on Signal Processing.

A. The exponential MT-covariance matrix

Let $u_E(\cdot; \cdot)$ be defined as the parameterized function

$$u_E(x; t) \triangleq \exp(t^T x), \tag{17}$$

where $t \in \mathbb{R}^p$. Using (9), (12) and (17) one can easily verify that the covariance matrix
of $X$ under $Q_X^{(u_E)}$ takes the form

$$\Sigma_X^{(u_E)}(t) = \frac{\partial^2 \log M_X(t)}{\partial t\, \partial t^T}, \tag{18}$$

where

$$M_X(t) \triangleq E[\exp(t^T X); P_X] \tag{19}$$

is the moment generating function of $X$, and it is assumed that $M_X(t)$ is finite in some open
region in $\mathbb{R}^p$ containing the origin. Note that the covariance matrix in (18) involves
higher-order statistics of $X$. Additionally, observe that $\Sigma_X^{(u_E)}(t)$ reduces to the
standard covariance matrix $\Sigma_X$ for $t = 0$.

The following theorem states a necessary and sufficient condition for Gaussianity of a random variable
$X$ based on its exponential MT-variance.

Theorem 1. A random variable $X$ with corresponding probability measure $P_X$ is Gaussian iff the
first-order derivative of the exponential MT-variance satisfies

$$\frac{\partial \sigma_X^{2\,(u_E)}(t)}{\partial t} = 0 \quad \forall t \in (t_0 - \epsilon, t_0 + \epsilon), \tag{20}$$

where $\epsilon$ is some positive constant and $t_0$ is an arbitrary point in $\mathbb{R}$. [A proof
is given in Appendix B]

Hence, if a random variable $X$ is non-Gaussian then its exponential MT-variance
$\sigma_X^{2\,(u_E)}(t)$ is not constant over any open interval. This property is used in the following
subsection to establish identifiability of the mixing matrix $A$.
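
As a quick check of Theorem 1, the exponential MT-variance can be worked out in closed form for two textbook distributions, using the fact stated above that it equals the second derivative of the log-moment-generating function. The two distributions are our own choice, made purely for illustration.

```latex
% Gaussian case: X ~ N(\mu, \sigma^2)
\log M_X(t) = \mu t + \tfrac{1}{2}\sigma^2 t^2
\;\Longrightarrow\;
\sigma_X^{2\,(u_E)}(t) = \frac{d^2}{dt^2}\log M_X(t) = \sigma^2,
\qquad
\frac{\partial \sigma_X^{2\,(u_E)}(t)}{\partial t} = 0 \quad \forall t .

% Non-Gaussian case: X ~ Exp(1), with M_X(t) = 1/(1-t) for t < 1
\log M_X(t) = -\log(1-t)
\;\Longrightarrow\;
\sigma_X^{2\,(u_E)}(t) = \frac{1}{(1-t)^2},
\qquad
\frac{\partial \sigma_X^{2\,(u_E)}(t)}{\partial t} = \frac{2}{(1-t)^3} \neq 0 \quad \forall t < 1 .
```

In the Gaussian case the MT-variance carries no information beyond the ordinary variance, whereas in the exponential case it changes with every choice of t, which is what makes different test-points informative.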

B. Identifiability of the mixing matrix A under two exponential MT-covariance matrices

Using (4), (9), (12) and (17) it can be shown that for any choice of the scaling parameter $t$ the
exponential MT-covariance of the observation vector $X$ has the following structure:

$$\Sigma_X^{(u_E)}(t) = A\, \Sigma_S^{(u_E)}(A^T t)\, A^T, \tag{21}$$

where $\Sigma_S^{(u_E)}(\cdot)$ is the covariance matrix of the latent vector $S$ under the transformed
probability measure $Q_S^{(u_E)}$. Since the components of $S$ are mutually independent under $P_S$, by
Property 4 in Proposition 1, they are mutually independent under $Q_S^{(u_E)}$, and therefore,
$\Sigma_S^{(u_E)}(\cdot)$ must be diagonal. Thus, the following property stems directly from (21):

Proposition 3. Let $t_1$ and $t_2$ denote two arbitrary points in $\mathbb{R}^p$. Assume that
1) The matrices $\Sigma_S^{(u_E)}(A^T t_1)$ and $\Sigma_S^{(u_E)}(A^T t_2)$ have finite diagonal
entries,
2) The diagonal entries of $\Sigma_S^{(u_E)}(A^T t_2)$ are non-zero, and
3) The matrix
$\Lambda^{(u_E)}(A^T t_1, A^T t_2) \triangleq \Sigma_S^{(u_E)}(A^T t_1)\, \Sigma_S^{(u_E)\,-1}(A^T t_2)$
has distinct diagonal entries, i.e., no pair of diagonal entries have the same value.
Then, $A$ can be uniquely identified, up to scaling and permutation of its columns, by solving the
following non-symmetric eigenvalue decomposition problem:

$$\Sigma_X^{(u_E)}(t_1)\, \Sigma_X^{(u_E)\,-1}(t_2)\, A = A\, \Lambda^{(u_E)}(A^T t_1, A^T t_2). \tag{22}$$

[A proof is given in [19]].

Based on the variation property of the exponential MT-variance, shown in Theorem 1, the following
Theorem shows that Assumption 3 in Proposition 3 is satisfied almost everywhere if at most one of the
components of $S$ is Gaussian.

Theorem 2. Let
$\mathcal{D}_E \triangleq \{ (t_1, t_2) \in \mathbb{R}^p \times \mathbb{R}^p : \Lambda^{(u_E)}(A^T t_1, A^T t_2) \text{ does not have distinct diagonal entries} \}$.
If at most one of the sources is Gaussian, then the set $\mathcal{D}_E$ has zero Lebesgue measure.
[A proof is given in Appendix C]

C. The exponential-MTICA algorithm

According to (21), Proposition 3, and Theorem 2, the separation matrix $B = A^{-1}$ is the unique
joint diagonalization matrix of any two exponential MT-covariance matrices $\Sigma_X^{(u_E)}(t_1)$
and $\Sigma_X^{(u_E)}(t_2)$ that satisfy the stated assumptions. Thus, the exponential-MTICA algorithm
is obtained by replacing the MT-functions $u_m(\cdot)$, $m = 1, \ldots, M$ in Algorithm 1 with a
sequence of exponential MT-functions $u_E(\cdot; t_m)$, $m = 1, \ldots, M$. A procedure for choosing
the test-points $t_m \in \mathbb{R}^p$, $m = 1, \ldots, M$ is given in Appendix G1. Clearly, only two
test-points are needed for obtaining a viable estimate of $B$. However, in order to increase
statistical stability and reduce the effect of ill-conditioned empirical MT-covariance matrices it
is better to use a sequence of more than two test-points.
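
The two-test-point case can be sketched directly from the eigenvalue problem (22). The code below is an illustrative toy and not the exponential-MTICA algorithm itself, which applies NOAJD to many matrices: it uses only two test-points, a plug-in weighted covariance, and a hypothetical mixing matrix, source pair and test-points chosen so that the moment generating function is finite.

```python
import numpy as np

def exp_mt_covariance(X, t):
    """Empirical exponential MT-covariance: weights proportional to exp(t^T x)."""
    a = t @ X
    w = np.exp(a - a.max())               # shift for stability; cancels on normalizing
    w /= w.sum()
    mu = X @ w
    return (X * w) @ X.T - np.outer(mu, mu)

def two_point_exp_mtica(X, t1, t2):
    """Estimate B = A^{-1} from the eigenvectors of C1 @ inv(C2), cf. (22)."""
    C1, C2 = exp_mt_covariance(X, t1), exp_mt_covariance(X, t2)
    _, evecs = np.linalg.eig(C1 @ np.linalg.inv(C2))
    A_hat = np.real(evecs)                # columns estimate A up to scale and order
    return np.linalg.inv(A_hat)

rng = np.random.default_rng(2)
N = 200_000
S = np.vstack([rng.uniform(-np.sqrt(3), np.sqrt(3), N),   # sub-Gaussian source
               rng.exponential(1.0, N) - 1.0])            # skewed source
A = np.array([[1.0, 0.5], [0.3, 1.0]])                    # hypothetical mixing matrix
X = A @ S

B_hat = two_point_exp_mtica(X, t1=np.array([0.3, 0.1]), t2=np.zeros(2))
G = B_hat @ A                             # ideally a scaled permutation matrix
```

With only two matrices the estimate depends heavily on how well separated the eigenvalues are for the chosen pair of test-points, which is the practical motivation for using more than two.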

V. Gaussian-MTICA 

In this section we parameterize the MT-function $u(\cdot; t, \tau)$, with translation parameter
$t \in \mathbb{R}^p$ and width parameter $\tau \in \mathbb{R}_{++}$, using a Gaussian family of
functions. Under the unitary mixing model (5) we show that if at most one of the sources is Gaussian,
the mixing matrix $U$ can be uniquely identified via eigenvalue decomposition of a single Gaussian
MT-covariance matrix. Based on this result the Gaussian-MTICA algorithm is obtained, which applies OAJD
to a sequence of empirical Gaussian MT-covariance matrices.

A. The Gaussian MT-covariance

We define the Gaussian MT-function $u_G(\cdot; \cdot, \cdot)$ as

$$u_G(x; t, \tau) \triangleq \exp\left( -\frac{\|x - t\|_2^2}{2\tau^2} \right), \tag{23}$$

where $t \in \mathbb{R}^p$, $\tau \in \mathbb{R}_{++}$, and $\|\cdot\|_2$ denotes the $\ell_2$-norm.
Since $u_G(\cdot; \cdot, \cdot)$ is strictly positive and bounded, one can easily verify that the
condition (7) is always satisfied. Relations (9) and (12) imply that the MT-function (23) produces a
weighted covariance matrix, $\Sigma_X^{(u_G)}(t, \tau)$, for which the observations are weighted in
inverse proportion to the distance $\|x - t\|_2$. This results in a kind of local covariance analysis
of $X$ in the vicinity of the test-point $t$.

The following theorem states a necessary and sufficient condition for Gaussianity of a random variable
$X$ based on its Gaussian MT-variance.

Theorem 3. A random variable $X$ with corresponding probability measure $P_X$ is Gaussian iff the
first-order partial derivative of the Gaussian MT-variance satisfies

$$\frac{\partial \sigma_X^{2\,(u_G)}(t, \tau)}{\partial t} = 0 \quad \forall t \in (t_0 - \epsilon, t_0 + \epsilon), \tag{24}$$

where $\epsilon$ is some positive constant and $t_0$ is some arbitrary point in $\mathbb{R}$. [A proof
is given in Appendix D]

Hence, similarly to the exponential MT-variance, if a random variable $X$ is non-Gaussian then for
any choice of the width parameter $\tau \in \mathbb{R}_{++}$ the Gaussian MT-variance
$\sigma_X^{2\,(u_G)}(t, \tau)$ is not constant w.r.t. $t$ over any open interval. This property is used
in the following subsection for proving identifiability of the mixing matrix $U$.



B. Identifiability of the unitary mixing matrix U under a single Gaussian MT-covariance

According to (5), (9), (12) and (23) the MT-covariance of the whitened observation vector $Z$ under
$Q_Z^{(u_G)}$ has the following structure:

$$\Sigma_Z^{(u_G)}(t, \tau) = U\, \Sigma_S^{(u_G)}(U^T t, \tau)\, U^T, \tag{25}$$

where $\Sigma_S^{(u_G)}(\cdot, \cdot)$ is the covariance matrix of $S$ under the transformed
probability measure $Q_S^{(u_G)}$. Since the components of $S$ are mutually independent under $P_S$,
then by Property 4 in Proposition 1 they are mutually independent under $Q_S^{(u_G)}$ and thus,
$\Sigma_S^{(u_G)}(\cdot, \cdot)$ must be diagonal. Therefore, assuming that
$\Sigma_S^{(u_G)}(U^T t, \tau)$ has distinct finite diagonal entries, the unitary matrix $U$ can be
uniquely identified (up to permutation and sign of its columns) via eigenvalue decomposition of the
Gaussian MT-covariance $\Sigma_Z^{(u_G)}(t, \tau)$.

Based on the variation property of the Gaussian MT-variance, shown in Theorem 3, the following
theorem states that if at most one of the components of $S$ is Gaussian, then
$\Sigma_S^{(u_G)}(U^T t, \tau)$ has distinct diagonal entries for almost every $t \in \mathbb{R}^p$.

Theorem 4. Let
$\mathcal{D}_G \triangleq \{ t \in \mathbb{R}^p : \Sigma_S^{(u_G)}(U^T t, \tau) \text{ does not have distinct diagonal entries} \}$.
If at most one of the sources is Gaussian, then the set $\mathcal{D}_G$ has zero Lebesgue measure.
[A proof is given in Appendix E]

C. The Gaussian-MTICA algorithm

According to (25) and Theorem 4, if at most one of the sources is Gaussian, then for almost every
$t \in \mathbb{R}^p$ the matrix $V = U^T$ is the unique diagonalizing matrix of the Gaussian
MT-covariance $\Sigma_Z^{(u_G)}(t, \tau)$. Thus, the Gaussian-MTICA algorithm is implemented by
replacing the MT-functions $u_m(\cdot)$, $m = 1, \ldots, M$ in Algorithm 2 with Gaussian MT-functions
$u_G(\cdot; t_m, \tau)$, $m = 1, \ldots, M$, where the width parameter $\tau \in \mathbb{R}_{++}$ is
fixed. A procedure for choosing the test-points $t_m \in \mathbb{R}^p$, $m = 1, \ldots, M$ is given in
Appendix G2. Clearly, only one test-point is needed for estimating $V$. However, estimation of $V$
based on diagonalization of a single empirical Gaussian MT-covariance has the following drawbacks:
1) For some choices of the translation parameter $t$ the spectrum of the corresponding Gaussian
MT-covariance may be degenerate, i.e., the eigenvalues may not be well separated. 2) A single Gaussian
MT-covariance may only capture part of the statistical information about $Z$ necessary to separate the
sources effectively. In order to alleviate these drawbacks it is better to use more than a single
test-point.
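
The single-test-point case can be sketched as whitening followed by one symmetric eigendecomposition. This is an illustrative toy under stated assumptions, not the Gaussian-MTICA algorithm, which applies OAJD over many test-points: the Gaussian weight exp(-||z - t||^2 / (2 tau^2)), the plug-in weighted covariance, the source pair, the mixing matrix and the test-point are all hypothetical choices.

```python
import numpy as np

def gaussian_mt_covariance(Z, t, tau):
    """Empirical Gaussian MT-covariance of Z at test-point t with width tau."""
    d2 = np.sum((Z - t[:, None]) ** 2, axis=0)
    w = np.exp(-d2 / (2.0 * tau ** 2))
    w /= w.sum()
    mu = Z @ w
    return (Z * w) @ Z.T - np.outer(mu, mu)

def single_point_gaussian_mtica(X, t, tau):
    """Whiten, then diagonalize one Gaussian MT-covariance; returns B_hat = V_hat W_hat."""
    Xc = X - X.mean(axis=1, keepdims=True)
    evals, evecs = np.linalg.eigh(Xc @ Xc.T / X.shape[1])
    W = evecs @ np.diag(evals ** -0.5) @ evecs.T      # empirical whitening matrix
    Z = W @ Xc
    _, U_hat = np.linalg.eigh(gaussian_mt_covariance(Z, t, tau))
    return U_hat.T @ W

rng = np.random.default_rng(3)
N = 100_000
S = np.vstack([rng.uniform(-np.sqrt(3), np.sqrt(3), N),       # sub-Gaussian source
               rng.laplace(scale=1 / np.sqrt(2), size=N)])    # super-Gaussian source
A = np.array([[1.0, 0.5], [0.3, 1.0]])                        # hypothetical mixing matrix

B_hat = single_point_gaussian_mtica(A @ S, t=np.array([0.5, 0.0]), tau=1.0)
G = B_hat @ A                             # ideally a signed permutation matrix
```

The source pair here was picked so that the two local variances differ clearly at the chosen test-point; for sources with similar shapes a single test-point can give nearly equal eigenvalues, which is the degeneracy the text warns about.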

VI. Comparisons between exponential and Gaussian MTICA 

Unlike Gaussian-MTICA that requires whitening, which under the model (4) leads to unitary mix- 
ing, exponential-MTICA does not require whitening. Therefore, as illustrated in Subsection VIII-D, 
exponential-MTICA is more robust to model mismatch scenarios under which the whitened observations 
are poorly modeled by unitary mixing. Moreover, in Gaussian-MTICA, in addition to the location
parameter $t$, which has the same dimensionality as the scaling parameter of the exponential
MT-function, one has to set a width parameter $\tau$.

On the other hand, unlike the exponential MT-function, the Gaussian MT-function is bounded over the 
observation space and isotropically de-emphasizes samples distant from the Gaussian location parameter. 
This property leads to several advantages of Gaussian-MTICA over exponential-MTICA including the 
following: 1) As illustrated in Subsections VIII-A and VIII-C, Gaussian-MTICA is more robust to 
distributions with unbounded support and outliers than exponential-MTICA. 2) Unlike the exponential 
MT-covariance, which does not exist for distributions with infinite moment generating function, one can 
easily verify that if a random vector has finite fourth-order moments then its corresponding Gaussian MT- 
covariance must take finite values. Additionally, the Gaussian MT-function has the physical property that 
it localizes linear dependence over the observation space. Hence, Gaussian-MTICA operates by jointly
minimizing the local linear dependencies in the vicinities of the selected set of test-points.

VII. Computational complexity 

In this section we evaluate the computational complexity of the exponential and Gaussian MTICA
algorithms and compare it to that of some other ICA methods. The exponential-MTICA algorithm is
comprised of two major steps: 1) estimation of $M$ exponential MT-covariance matrices with
computational complexity of $O(M \cdot N \cdot p^2)$ flops, and 2) NOAJD with computational complexity
of $O(L \cdot M \cdot p^3)$ flops, where $L$ is the number of iterations used in the NOAJD algorithm.
Therefore, exponential-MTICA has computational complexity of
$O(M \cdot N \cdot p^2 + L \cdot M \cdot p^3)$ flops.

The Gaussian-MTICA algorithm is comprised of 1) a whitening stage with computational complexity of
$O(N \cdot p^2)$ flops, 2) estimation of $M$ Gaussian MT-covariance matrices with computational
complexity of $O(M \cdot N \cdot p^2)$ flops, and 3) OAJD with computational complexity of
$O(L \cdot M \cdot p^3)$ flops. Thus, Gaussian-MTICA has computational complexity of
$O(M \cdot N \cdot p^2 + L \cdot M \cdot p^3)$ flops.

Table I compares the computational complexity of exponential-MTICA and Gaussian-MTICA to the
computational complexity of other ICA techniques, such as JADE, EIMAX, FICA, KGV, FKICA, and
RADICAL. One can notice that, similarly to JADE, FICA, and EIMAX, the computational complexities of
exponential-MTICA and Gaussian-MTICA are linear in the sample size N, which makes them favorable
for large data sets. Moreover, one sees that unlike data-based techniques such as EIMAX, FICA, KGV,
FKICA and RADICAL, the iterative part of exponential-MTICA and Gaussian-MTICA has computational
complexity that is not affected by the sample size.

TABLE I

Computational complexity of EMTICA, GMTICA, JADE, EIMAX, FICA, KGV, FKICA and RADICAL. The
sample size, dimension, number of iterations, and number of matrices to be approximately diagonalized
are denoted by N, p, L, and M, respectively. The rank of an N x N Gram matrix after incomplete
Cholesky decomposition in the KGV and FKICA algorithms is denoted by D(N). The number of Jacobi
angles and data augmentations in RADICAL are denoted by K and R, respectively. Here EMTICA and
GMTICA refer to exponential-MTICA and Gaussian-MTICA, respectively.

Algorithm    Computational complexity
EMTICA       $O(M \cdot N \cdot p^2 + L \cdot M \cdot p^3)$
GMTICA       $O(M \cdot N \cdot p^2 + L \cdot M \cdot p^3)$
JADE         $O(M \cdot N \cdot p^2 + L \cdot M \cdot p^3)$
EIMAX        $O(L \cdot N \cdot p^2)$
FICA         $O(L \cdot N \cdot p)$
KGV          $O(L \cdot (N \cdot D^2(N) \cdot p^2 + D^3(N) \cdot p^3))$
FKICA        $O(L \cdot N \cdot D^2(N) \cdot p^2)$
RADICAL      $O(L \cdot (K \cdot N \cdot R \cdot \log(N \cdot R) \cdot p^2))$



VIII. Numerical examples

In this section, the performances of exponential-MTICA and Gaussian-MTICA are compared to the
JADE, EIMAX, FICA, KGV, FKICA, and RADICAL algorithms using their publicly available MATLAB
code. The JADE, FICA, EIMAX, and RADICAL algorithms were used with their default settings. In
KGV and FKICA the Gaussian kernel width parameter was set to $\sigma = 1/2$ for p = 2 and
$\sigma = 1$ for p > 2. In FKICA, the maximum number of iterations and convergence threshold were set
to 500 and 1e-10, respectively. All compared algorithms were initialized by the identity matrix. In
Subsection VIII-E the performance of KGV and FKICA was also evaluated after initialization by the
JADE algorithm.

The test-points $\mathbf{t}_1, \ldots, \mathbf{t}_M$ in the exponential and Gaussian MTICA algorithms were selected randomly
according to the procedures in Appendices G1 and G2, respectively. For p = 2 we used M = 300 test-points,
while for p > 2 we used M = 1000 test-points. The width parameter of the Gaussian MT-function
in the Gaussian-MTICA algorithm was set to $\tau = 1/2$ for p = 2 and $\tau = 1$ for p > 2. The NOAJD in
exponential-MTICA was carried out using Pham's algorithm [ ], while the OAJD in Gaussian-MTICA
was performed using the FG algorithm [33]. In both Pham's and the FG algorithms the initial diagonalizing
matrix, the maximum number of iterations and the convergence threshold were set to the identity matrix, 500
and 1e-10, respectively. In all figure legends below, the exponential and Gaussian MTICA algorithms are
abbreviated by EMTICA and GMTICA, respectively.

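We used the Amari error [ ] as a performance measure that compares the true separation matrix $\mathbf{B}$
with its estimate $\hat{\mathbf{B}}$. The Amari error between two matrices $\mathbf{G} \in \mathbb{R}^{p \times p}$ and $\mathbf{H} \in \mathbb{R}^{p \times p}$ is defined as:

$$d_A(\mathbf{G}, \mathbf{H}) = \frac{1}{2p(p-1)} \sum_{i=1}^{p} \left( \frac{\sum_{j=1}^{p} |a_{ij}|}{\max_j |a_{ij}|} - 1 \right) + \frac{1}{2p(p-1)} \sum_{j=1}^{p} \left( \frac{\sum_{i=1}^{p} |a_{ij}|}{\max_i |a_{ij}|} - 1 \right), \qquad (26)$$

where $a_{ij} = [\mathbf{G}\mathbf{H}^{-1}]_{ij}$. Notice that the Amari error is invariant to permutation and scaling of the
columns of $\mathbf{G}$ and $\mathbf{H}$, and takes values between 0 and 1. Also notice that $d_A(\mathbf{G}, \mathbf{H}) = 0$ if and only if
$\mathbf{G}$ and $\mathbf{H}$ are equal up to scaling and permutation of their columns. In addition to the Amari error, some
of the trials examined the run times of the compared algorithms.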
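
As an illustration, the Amari error in (26) can be evaluated numerically along the lines of the following minimal Python sketch. It is not the evaluation code used in the experiments: the function name `amari_error` and the test matrices are assumptions made for this example. The sanity check uses a row-permuted, row-scaled copy of H, for which the product G H^{-1} is exactly a scaled permutation matrix.

```python
import numpy as np

def amari_error(G, H):
    """Amari error between two invertible p x p matrices, following (26)."""
    p = G.shape[0]
    a = np.abs(G @ np.linalg.inv(H))                     # |a_ij| with a = G H^{-1}
    rows = (a.sum(axis=1) / a.max(axis=1) - 1.0).sum()   # first sum in (26)
    cols = (a.sum(axis=0) / a.max(axis=0) - 1.0).sum()   # second sum in (26)
    return (rows + cols) / (2.0 * p * (p - 1))

rng = np.random.default_rng(0)
H = rng.standard_normal((3, 3))
P = np.eye(3)[[2, 0, 1]]                 # permutation matrix (illustrative)
D = np.diag([2.0, -0.5, 3.0])            # nonzero scalings (illustrative)
G = P @ D @ H                            # G H^{-1} = P D, a scaled permutation

err_same = amari_error(G, H)             # numerically zero
err_rand = amari_error(rng.standard_normal((3, 3)), H)   # lies in [0, 1]
```

Each bracketed term in (26) is at most p - 1, which is why the normalization by 2p(p - 1) keeps the error in the unit interval.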

The simulations were carried out using data obtained from the univariate source distributions in Table 
II. The sources were translated and scaled to have zero mean and unit variance. In order to avoid ill- 
conditioned mixing, the generated sources were mixed using random matrices with condition number 
between one and two. 
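
The data generation described above can be sketched as follows. This is a hedged illustration only: the text does not state how the random mixing matrices were drawn, so the construction below (random orthogonal factors with singular values in [1, 2]) is an assumption that merely guarantees a condition number between one and two. Standardization with sample moments, and the names `random_mixing_matrix` and `standardize`, are likewise assumptions.

```python
import numpy as np

def random_mixing_matrix(p, rng, max_cond=2.0):
    """Random p x p matrix whose condition number lies in [1, max_cond]."""
    # Orthogonal factors from QR decompositions of Gaussian matrices.
    U, _ = np.linalg.qr(rng.standard_normal((p, p)))
    V, _ = np.linalg.qr(rng.standard_normal((p, p)))
    # Singular values in [1, max_cond] bound the condition number.
    s = rng.uniform(1.0, max_cond, size=p)
    return U @ np.diag(s) @ V.T

def standardize(S):
    """Translate and scale each source (row) to zero mean and unit variance."""
    return (S - S.mean(axis=1, keepdims=True)) / S.std(axis=1, keepdims=True)

rng = np.random.default_rng(1)
p, N = 2, 1000
# Two example sources: uniform on [0, 1] and Laplace (location 1, scale 1).
S = standardize(np.vstack([rng.uniform(0.0, 1.0, N), rng.laplace(1.0, 1.0, N)]))
A = random_mixing_matrix(p, rng)
X = A @ S                                # observed (manifest) samples, p x N
cond_A = np.linalg.cond(A)
```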

A. Sensitivity to source distribution 

In this experiment we study two-component ICA problems with N = 1000 samples. We illustrate two
types of ICA applications. In the first application, the source distributions are identical. For each of the 12
source distributions in Table II, we conducted 1000 Monte-Carlo simulations. For each distribution type,
box plots of the Amari errors obtained by each algorithm are depicted in Fig. 1. One sees that Gaussian-MTICA
is robust to the source distribution, with performance similar to that of the KGV and RADICAL algorithms.
TABLE II

Probability distributions used in the simulation examples.

Distribution               Parameters
Uniform                    Support [0, 1].
Arc sine                   Support [0, 1].
Beta                       Shape parameters α = 1 and β = 5.
Logit-normal               Location parameter 1 and scale parameter 1.
Laplace                    Location parameter 1 and scale parameter 1.
Exponential                Rate parameter λ = 1.
Rayleigh                   Scale parameter 1.
Weibull                    Shape parameter 5 and scale parameter 1.
Gamma                      Shape parameter 1 and scale parameter 1.
Non-central chi-squared    Degrees of freedom 4, non-centrality parameter λ = 2.
Central chi-squared        Degrees of freedom 4.
Rice                       Shape parameter 1/2.

One can also observe that exponential-MTICA is more sensitive than Gaussian-MTICA to distributions
with unbounded support, such as the Laplace and exponential distributions.

In the second application, we chose two sources uniformly at random among the 12 possibilities. A
total of 1000 Monte-Carlo simulations were performed. The box plots of the Amari errors obtained by
each algorithm are depicted in Fig. 2. Notice that, similarly to KGV, FKICA, and RADICAL,
exponential-MTICA and Gaussian-MTICA perform better than the JADE, FICA, and EIMAX algorithms.
Gaussian-MTICA performs better than exponential-MTICA due to the sensitivity of the latter to
distributions with unbounded support.

We note that although the Gaussian-MTICA, KGV and RADICAL algorithms perform similarly well, 
the Gaussian-MTICA has reduced computational complexity as indicated by Table I and the run time 
analysis in Fig. 4. 

B. Sensitivity to sample size 

In this experiment we illustrate the sensitivity of the compared algorithms to sample size. For each
sample size ranging from N = 100 to N = 10000 we performed 1000 Monte-Carlo simulations using
p = 2 sources. The source distributions were chosen uniformly at random from the 12 possible distributions
in Table II. The averaged Amari errors obtained by each algorithm are depicted in Fig. 3. Observe
Fig. 1. Sensitivity to source distribution. Box plots of Amari errors obtained by the compared algorithms for two-component
ICA with identical source distributions. Notice that Gaussian-MTICA is robust to source distribution with performance similar
to the KGV and RADICAL algorithms. Although the Gaussian-MTICA, KGV and RADICAL algorithms perform similarly
well, the Gaussian-MTICA has reduced computational complexity as indicated by Table I. Also notice that Gaussian-MTICA
performs better than exponential-MTICA for distributions with unbounded support.

Fig. 2. Sensitivity to source distribution. Box plots of Amari errors obtained by the compared algorithms for two-component
ICA with randomly chosen distributions. Similarly to KGV, FKICA, and RADICAL, exponential-MTICA and Gaussian-MTICA
perform better than the JADE, FICA, and EIMAX algorithms. Although the Gaussian-MTICA, KGV and RADICAL
algorithms perform similarly well, the Gaussian-MTICA has reduced computational complexity as indicated by Table I.

that for all examined sample sizes KGV, FKICA, RADICAL, exponential-MTICA and Gaussian-MTICA
outperform the JADE, FICA and EIMAX algorithms. This result stems from the sensitivity of JADE, FICA
and EIMAX to varying source distributions, as shown in Subsection VIII-A. We note that Gaussian-MTICA,
KGV, FKICA and RADICAL perform better than exponential-MTICA due to the sensitivity of the
latter to distributions with unbounded support. The averaged run time of each algorithm is depicted in
Fig. 4. Notice that for large sample sizes the run times of exponential-MTICA and Gaussian-MTICA are
significantly lower than those obtained by KGV, RADICAL, FKICA and EIMAX. This may result from
lower computational complexity, as indicated by Table I, and more rapid convergence.

C. Robustness to outliers 

In this experiment we demonstrate the robustness of the compared algorithms to outliers. We simulated
outliers by randomly choosing up to 25 data points to corrupt out of a total of 1000 samples. This was carried
out by adding the value +5 or -5, each chosen with probability 1/2, to a single component in each of
the selected data points. We performed 1000 Monte-Carlo simulations using source distributions chosen
uniformly at random from the 12 possible distributions in Table II. The averaged Amari errors produced
Fig. 3. Sensitivity to sample size. Averaged Amari errors for two-component ICA problems with randomly selected source
distributions and varying sample size ranging from N = 100 to N = 10000. Notice that for all tested sample sizes the KGV,
FKICA, RADICAL, exponential-MTICA and Gaussian-MTICA outperform the JADE, FICA and EIMAX algorithms. Although
the Gaussian-MTICA, KGV and RADICAL algorithms perform similarly well, the Gaussian-MTICA has reduced computational
complexity as indicated by Table I and Fig. 4.



by each algorithm are depicted in Fig. 5. One can observe that the proposed Gaussian-MTICA method is
least sensitive to outliers. This is due to the boundedness of the Gaussian MT-function, which allows it to
de-emphasize samples that are distant from its location parameter. In comparison to Gaussian-MTICA,
exponential-MTICA is more sensitive to outliers due to the sensitivity of the empirical moment generating
function to outliers. However, one can notice that in comparison to JADE, FICA and EIMAX,
exponential-MTICA is more resilient to outliers.
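
The outlier contamination described in this subsection can be sketched as below. This is a minimal illustration under stated assumptions: the data are stored as a p x N array, the corrupted samples are drawn without replacement, and the component to corrupt is chosen uniformly at random. The helper name `add_outliers` is hypothetical.

```python
import numpy as np

def add_outliers(X, n_outliers, rng, magnitude=5.0):
    """Add +magnitude or -magnitude (probability 1/2 each) to a single
    component of n_outliers distinct samples (columns) of the p x N array X."""
    p, N = X.shape
    Xc = X.copy()
    cols = rng.choice(N, size=n_outliers, replace=False)   # samples to corrupt
    rows = rng.integers(0, p, size=n_outliers)             # one component each
    signs = rng.choice([-1.0, 1.0], size=n_outliers)       # +5 or -5
    Xc[rows, cols] += signs * magnitude
    return Xc

rng = np.random.default_rng(2)
X = rng.standard_normal((2, 1000))       # placeholder data for the example
Xc = add_outliers(X, 25, rng)
diff = Xc - X                            # nonzero only at corrupted entries
```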

D. Sensitivity to model mismatch 

Here we demonstrate relative insensitivity to model mismatch of the exponential-MTICA algorithm. 
To generate model mismatch we used the following noisy linear mixing model: 

$$\mathbf{X} = \mathbf{A}\mathbf{S} + \lambda\mathbf{E}, \qquad (27)$$

where $\mathbf{E}$ is an additive noise vector with statistically independent components having zero mean and unit
variance, and $\lambda > 0$ is a scaling parameter that controls the signal-to-noise-ratio (SNR) according to

$$\mathrm{SNR} = \frac{\mathrm{tr}\left[\mathbf{A}\mathbf{A}^T\right]}{p \cdot \lambda^2}. \qquad (28)$$

Fig. 4. Sensitivity to sample size. Averaged run times for two-component ICA problems with randomly selected source
distributions and varying sample size ranging from N = 100 to N = 10000. One sees that for large sample size the run
times of exponential-MTICA and Gaussian-MTICA are significantly lower than those obtained by KGV, RADICAL, FKICA
and EIMAX.


For each value of SNR ranging from -5 [dB] to 10 [dB] we performed 1000 Monte-Carlo simulations
using p = 2 sources and N = 250 samples. In order to filter out the sensitivity of exponential-MTICA
to probability distributions with unbounded support, the source distributions were chosen uniformly at
random from the first four distributions in Table II, and the components of the noise vector E were uniformly
distributed. The averaged Amari errors obtained by each algorithm are depicted in Fig. 6. Observe that
for low SNRs (SNR < 0 [dB]) exponential-MTICA, which does not require whitening, outperforms all
other compared algorithms that are based on whitening and unitary de-mixing. This is due to the fact that
for low SNRs the whitened observations deviate largely from unitary mixing. On the other hand, for
high SNRs one can notice that Gaussian-MTICA, KGV and RADICAL attain better performance than
exponential-MTICA. This may arise from the fact that for high SNRs the whitened observations admit
Fig. 5. Robustness to outliers. The averaged Amari errors obtained by the compared algorithms versus number of outliers for
two-component ICA with randomly chosen source distributions. One sees that Gaussian-MTICA is least sensitive to outliers.



nearly unitary mixing, and therefore, obtaining the empirical mixing matrix via whitening and unitary
de-mixing, which involves a more highly constrained optimization, should result in more accurate estimation.
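
The noisy mixing model of this subsection can be simulated along the following lines. The sketch assumes that the SNR in (28) is the ratio of the total signal power, tr[A A^T] for unit-variance sources, to the total noise power, p times the squared noise scale, and that the SNR in dB is converted by 10^(SNR_dB / 10). The helper name `noise_scale`, the Gaussian draw of A, and the use of uniform sources are assumptions made for the example.

```python
import numpy as np

def noise_scale(A, snr_db):
    """Noise scaling parameter giving the requested SNR (in dB),
    assuming SNR = tr[A A^T] / (p * scale^2)."""
    p = A.shape[0]
    snr = 10.0 ** (snr_db / 10.0)
    return np.sqrt(np.trace(A @ A.T) / (p * snr))

rng = np.random.default_rng(3)
p, N = 2, 250
A = rng.standard_normal((p, p))          # placeholder mixing matrix
half_width = np.sqrt(3.0)                # uniform on [-sqrt(3), sqrt(3)] has unit variance
S = rng.uniform(-half_width, half_width, size=(p, N))   # unit-variance sources
E = rng.uniform(-half_width, half_width, size=(p, N))   # unit-variance uniform noise
lam = noise_scale(A, snr_db=0.0)
X = A @ S + lam * E                      # noisy observations
snr_check = np.trace(A @ A.T) / (p * lam ** 2)          # recovers the linear SNR
```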



E. Sensitivity to dimension 

In this example we studied the sensitivity of the compared algorithms to an increasing number of
sources ranging from p = 3 to p = 15, with N = 1000 samples. We performed 1000 Monte-Carlo
simulations using source distributions chosen uniformly at random from the 12 possible distributions in
Table II. The averaged Amari errors are depicted in Fig. 7. Here, exponential-MTICA, Gaussian-MTICA
and RADICAL outperform all other compared algorithms when the number of sources is high. The
KGV and FKICA algorithms perform better than JADE, EIMAX and FICA only after being initialized by
the JADE algorithm. The averaged run times are depicted in Fig. 8. Observe that Gaussian-MTICA and
exponential-MTICA run faster than RADICAL, KGV and FKICA when initialized by the identity
matrix. The run times of KGV and FKICA improve after initialization by JADE.
Fig. 6. Sensitivity to model mismatch. The averaged Amari errors obtained by the compared algorithms, under the noisy linear
mixing model $\mathbf{X} = \mathbf{A}\mathbf{S} + \lambda\mathbf{E}$, versus SNR. Since for low SNRs the whitened observations largely deviate from unitary mixing,
exponential-MTICA outperforms all other algorithms that are based on whitening and unitary de-mixing. For high SNRs the
whitened observations admit nearly unitary mixing, and thus, Gaussian-MTICA, KGV and RADICAL attain better performance
than exponential-MTICA.

IX. Conclusion 

In this paper, a new framework for ICA was proposed that is based on applying a structured transform
to the probability distribution of the data. In MTICA the separation matrix is estimated via approximate
joint diagonalization of some empirical measure-transformed covariance matrices that are obtained by
evaluating the MT-function at different test-points in the parameter space. By specifying the MT-function
in the exponential family the ICA technique proposed in [ ], called here exponential-MTICA, was
obtained. Specification of the MT-function in the Gaussian family resulted in a new ICA algorithm called
Gaussian-MTICA. The proposed MTICA approach was tested in extensive simulation examples that
illustrated the advantages of exponential-MTICA and Gaussian-MTICA over state-of-the-art algorithms
for ICA. It is likely that there exist other classes of MT-functions that may result in other ICA algorithms
using the proposed framework.
Fig. 7. Sensitivity to dimension. The averaged Amari errors obtained by the compared algorithms versus the number of sources
p. The dashed magenta and green curves plot the performance of the KGV and FKICA algorithms after initialization by the JADE
algorithm. Observe that exponential-MTICA, Gaussian-MTICA and RADICAL outperform all other compared algorithms for
a high number of sources. The RADICAL algorithm attains the best separation performance at the expense of high computational
complexity as indicated by Table I and Fig. 8. Also observe that KGV and FKICA perform better than EIMAX, JADE and
FICA only after initialization by the JADE algorithm.



Appendix 

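A. Proof of Proposition 1:

1) Property 1:

Since $\varphi_u(\mathbf{x})$ is nonnegative, then by Corollary 2.3.6 in [40] $Q_{\mathbf{X}}^{(u)}$ is a measure on $\mathcal{S}_{\mathcal{X}}$. Furthermore,
$Q_{\mathbf{X}}^{(u)}(\mathcal{X}) = 1$, so that $Q_{\mathbf{X}}^{(u)}$ is a probability measure on $\mathcal{S}_{\mathcal{X}}$.

2) Property 2:

Follows from Definitions 4.1.1 and 4.1.3 in [ ].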

3) Property 3: 

According to the definition of $\varphi_u(\mathbf{x})$ in (9), the strict positivity of $u(\mathbf{x})$, and Property 2, we have that
$Q_{\mathbf{X}}^{(u)}$ is absolutely continuous w.r.t. $P_{\mathbf{X}}$ with strictly positive Radon-Nikodym derivative
$dQ_{\mathbf{X}}^{(u)}(\mathbf{x}) / dP_{\mathbf{X}}(\mathbf{x}) = \varphi_u(\mathbf{x})$. Therefore, by Proposition 4.1.2 in [ ] it is implied that $P_{\mathbf{X}}$ is absolutely continuous w.r.t.
$Q_{\mathbf{X}}^{(u)}$ with a strictly positive Radon-Nikodym derivative given by $dP_{\mathbf{X}}(\mathbf{x}) / dQ_{\mathbf{X}}^{(u)}(\mathbf{x}) = \varphi_u^{-1}(\mathbf{x})$. Using (9), (10)
and the strict positivity of $u(\mathbf{x})$ one can easily verify that

$$\varphi_u^{-1}(\mathbf{x}) = \frac{u^{-1}(\mathbf{x})}{E\left[u^{-1}(\mathbf{X}); Q_{\mathbf{X}}^{(u)}\right]}.$$

Fig. 8. Sensitivity to dimension. The averaged run times obtained by the compared algorithms versus the number of sources
p. The dashed magenta and green curves plot the run times of the KGV and FKICA algorithms after initialization by the JADE
algorithm. One sees that exponential-MTICA and Gaussian-MTICA perform faster than RADICAL, KGV and FKICA for
identity matrix initialization. The run times of the KGV and FKICA are improved after initialization by JADE.

4) Property 4: 

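Let $Q_{X_k}^{(u)}$ denote the marginal probability measure of $Q_{\mathbf{X}}^{(u)}$, defined on $\mathcal{S}_{\mathcal{X}_k}$. Additionally, let $A_1, \ldots, A_p$
denote arbitrary sets in the $\sigma$-algebras $\mathcal{S}_{\mathcal{X}_1}, \ldots, \mathcal{S}_{\mathcal{X}_p}$, respectively. Using (6), (8), (9), the assumed
statistical independence of $X_1, \ldots, X_p$ under $P_{\mathbf{X}}$, and Tonelli's Theorem [38]:

$$Q_{\mathbf{X}}^{(u)}(A_1 \times \cdots \times A_p) = \int_{A_1 \times \cdots \times A_p} \frac{u(\mathbf{x})}{E[u(\mathbf{X}); P_{\mathbf{X}}]} \, dP_{\mathbf{X}}(\mathbf{x}) = \prod_{k=1}^{p} \int_{A_k} \frac{u_k(x_k)}{E[u_k(X_k); P_{X_k}]} \, dP_{X_k}(x_k), \qquad (29)$$

which implies that

$$Q_{X_k}^{(u)}(A_k) = \int_{A_k} \frac{u_k(x_k)}{E[u_k(X_k); P_{X_k}]} \, dP_{X_k}(x_k), \quad k = 1, \ldots, p. \qquad (30)$$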



February 5, 2013 



DRAFT 



SUBMITTED TO THE IEEE TRANSACTIONS ON SIGNAL PROCESSING 25 

By (29) and (30)

$$Q_{\mathbf{X}}^{(u)}(A_1 \times \cdots \times A_p) = \prod_{k=1}^{p} Q_{X_k}^{(u)}(A_k). \qquad (31)$$

Therefore, since $A_1, \ldots, A_p$ are arbitrary, $X_1, \ldots, X_p$ are mutually independent under the transformed
probability measure $Q_{\mathbf{X}}^{(u)}$.



B. Proof of Theorem 1:

Define $M_X^{(g)}(t) \triangleq E\left[\exp(tX); Q_X^{(g)}\right]$ as the moment generating function of X under the transformed
probability measure $Q_X^{(g)}$ that is associated with the exponential MT-function

$$g(X; t_0) = \exp(t_0 X). \qquad (32)$$

Using (9), (12), (17) and (32) one can verify that

$$\sigma_X^{2(g)}(t + t_0) = \frac{\partial^2 \log M_X^{(g)}(t)}{\partial t^2}. \qquad (33)$$

If the condition in (20) is satisfied then by (33) and the properties of the moment generating function

$$M_X^{(g)}(t) = \exp\left(\mu_X^{(g)} t + \frac{1}{2} \sigma_X^{2(g)} t^2\right) \quad \forall t \in (-\epsilon, \epsilon), \qquad (34)$$

where $\mu_X^{(g)}$ and $\sigma_X^{2(g)}$ denote the mean and variance of X under $Q_X^{(g)}$, respectively. Since the moment
generating function, restricted to any open interval that contains the origin, uniquely determines the
distribution [42], [43], then $Q_X^{(g)}$ is a Gaussian measure. Hence, by Lemma 1 in Appendix F we have
that $P_X$ is Gaussian.

Conversely, if $P_X$ is Gaussian then by Lemma 1 in Appendix F the probability measure $Q_X^{(g)}$ is
Gaussian, and its corresponding moment generating function $M_X^{(g)}(t)$ must satisfy (34). Therefore, using
(33) one obtains $\sigma_X^{2(g)}(t) = \sigma_X^{2(g)}$ $\forall t \in (t_0 - \epsilon, t_0 + \epsilon)$. □
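
The statement of Theorem 1 can be checked numerically with the following minimal Python sketch. It estimates the variance of a scalar sample under the measure reweighted by the exponential MT-function exp(tX) for several values of t. The estimator below (normalized exponential weights, weighted mean and weighted variance) is an assumed empirical counterpart of the MT-variance, and the function name `exp_mt_variance`, the sample size and the grid of test-points are choices made for this example.

```python
import numpy as np

def exp_mt_variance(x, t):
    """Empirical variance of x under the measure reweighted by exp(t * x)."""
    w = np.exp(t * x)
    w /= w.sum()                         # normalized weights, sum to one
    mean = np.sum(w * x)                 # weighted (measure-transformed) mean
    return np.sum(w * (x - mean) ** 2)   # weighted (measure-transformed) variance

rng = np.random.default_rng(4)
N = 200_000
ts = np.linspace(-1.0, 1.0, 9)           # test-points around t = 0
gauss = rng.standard_normal(N)                            # Gaussian, unit variance
unif = rng.uniform(-np.sqrt(3.0), np.sqrt(3.0), N)        # uniform, unit variance

var_gauss = np.array([exp_mt_variance(gauss, t) for t in ts])
var_unif = np.array([exp_mt_variance(unif, t) for t in ts])
spread_gauss = var_gauss.max() - var_gauss.min()   # close to zero
spread_unif = var_unif.max() - var_unif.min()      # clearly nonzero
```

For the Gaussian sample the estimated variance stays essentially constant over the grid, whereas for the uniform sample it shrinks as |t| grows, in line with the theorem.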

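C. Proof of Theorem 2:

Since $\boldsymbol{\mu} = \mathbf{A}^T \mathbf{t}$ defines a bijective mapping from $\mathbb{R}^p$ to $\mathbb{R}^p$ it is sufficient to show that the set

$$\mathcal{D} \triangleq \left\{ (\boldsymbol{\mu}_1, \boldsymbol{\mu}_2) \in \mathbb{R}^p \times \mathbb{R}^p : \boldsymbol{\Lambda}_{\mathbf{S}}^{(g)}(\boldsymbol{\mu}_1, \boldsymbol{\mu}_2) \text{ does not have distinct diagonal entries} \right\} \qquad (35)$$

has zero Lebesgue measure. By the definition of $\boldsymbol{\Lambda}_{\mathbf{S}}^{(g)}(\boldsymbol{\mu}_1, \boldsymbol{\mu}_2)$ in Proposition 3, the set $\mathcal{D}$ can be written
as

$$\mathcal{D} = \bigcup_{j \neq k} \mathcal{D}_{j,k}, \qquad (36)$$

where

$$\mathcal{D}_{j,k} \triangleq \left\{ (\boldsymbol{\mu}_1, \boldsymbol{\mu}_2) \in \mathbb{R}^p \times \mathbb{R}^p : \frac{\sigma_{S_j}^{2(g)}(\mu_{1,j})}{\sigma_{S_j}^{2(g)}(\mu_{2,j})} = \frac{\sigma_{S_k}^{2(g)}(\mu_{1,k})}{\sigma_{S_k}^{2(g)}(\mu_{2,k})} \right\}, \qquad (37)$$

$\sigma_{S_j}^{2(g)}(\mu_{i,j}) \triangleq \left[\boldsymbol{\Sigma}_{\mathbf{S}}^{(g)}(\boldsymbol{\mu}_i)\right]_{j,j}$, and $\mu_{i,j} \triangleq [\boldsymbol{\mu}_i]_j$. Since at most one of the sources is Gaussian, then either
$S_j$ or $S_k$ must be non-Gaussian. Let $S_k$ denote the non-Gaussian source. By Theorem 1 we have that
the exponential MT-variance $\sigma_{S_k}^{2(g)}(\mu_{1,k})$ is not constant over any open interval. Thus, for almost every
$(\mu_{1,j}, \mu_{2,j}, \mu_{1,k}, \mu_{2,k}) \in \mathbb{R}^4$ for which the quotients in (37) are finite we have that

$$\frac{\sigma_{S_j}^{2(g)}(\mu_{1,j})}{\sigma_{S_j}^{2(g)}(\mu_{2,j})} \neq \frac{\sigma_{S_k}^{2(g)}(\mu_{1,k})}{\sigma_{S_k}^{2(g)}(\mu_{2,k})}.$$

Hence, the Lebesgue measure of $\mathcal{D}_{j,k}$ is zero for any $j \neq k$. Therefore, by relation (36) and the
sub-additivity of Lebesgue's measure, the set $\mathcal{D}$ must have zero Lebesgue measure. □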



D. Proof of Theorem 3:

Define $M_X^{(g)}(t) \triangleq E\left[\exp(tX); Q_X^{(g)}\right]$ as the moment generating function of X under the transformed
probability measure $Q_X^{(g)}$ associated with the Gaussian MT-function

$$g(X; t_0, \tau) = \exp\left(-\frac{(X - t_0)^2}{2\tau^2}\right). \qquad (38)$$

Using (9), (12), (23) and (38) one can verify that

$$\sigma_X^{2(g)}(t + t_0, \tau) = \tau^4 \, \frac{\partial^2 \log M_X^{(g)}(t / \tau^2)}{\partial t^2}. \qquad (39)$$

If the condition in (24) is satisfied then by (39) and the properties of the moment generating function

$$M_X^{(g)}(t) = \exp\left(\mu_X^{(g)} t + \frac{1}{2} \sigma_X^{2(g)} t^2\right) \quad \forall t \in (-\epsilon, \epsilon), \qquad (40)$$

where $\mu_X^{(g)}$ and $\sigma_X^{2(g)}$ denote the mean and the variance of X under $Q_X^{(g)}$, respectively. Since the moment
generating function, restricted to any open interval that contains the origin, uniquely determines the
distribution [ ], it is implied that $Q_X^{(g)}$ is a Gaussian measure. Hence, by Lemma 2 in Appendix F
we have that $P_X$ is Gaussian.

Conversely, if $P_X$ is Gaussian then by Lemma 2 in Appendix F the probability measure $Q_X^{(g)}$ is
Gaussian, and its corresponding moment generating function $M_X^{(g)}(t)$ must satisfy (40). Therefore, using
(39) one obtains $\sigma_X^{2(g)}(t, \tau) = \sigma_X^{2(g)}$ $\forall t \in (t_0 - \epsilon, t_0 + \epsilon)$. □

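E. Proof of Theorem 4:

Since the relation $\boldsymbol{\mu} = \mathbf{U}^T \mathbf{t}$ defines a bijective mapping from $\mathbb{R}^p$ to $\mathbb{R}^p$ it is sufficient to show that
the set

$$\mathcal{D} \triangleq \left\{ \boldsymbol{\mu} \in \mathbb{R}^p : \boldsymbol{\Sigma}_{\mathbf{S}}^{(g)}(\boldsymbol{\mu}, \tau) \text{ does not have distinct diagonal entries} \right\} \qquad (41)$$

has zero Lebesgue measure. Clearly, the set $\mathcal{D}$ can be written as

$$\mathcal{D} = \bigcup_{\substack{j,k=1 \\ j \neq k}}^{p} \mathcal{D}_{j,k}, \qquad (42)$$

where

$$\mathcal{D}_{j,k} \triangleq \left\{ \boldsymbol{\mu} \in \mathbb{R}^p : \sigma_{S_j}^{2(g)}(\mu_j, \tau) = \sigma_{S_k}^{2(g)}(\mu_k, \tau) \right\}, \qquad (43)$$

$\sigma_{S_j}^{2(g)}(\mu_j, \tau) \triangleq \left[\boldsymbol{\Sigma}_{\mathbf{S}}^{(g)}(\boldsymbol{\mu}, \tau)\right]_{j,j}$ and $\mu_j \triangleq [\boldsymbol{\mu}]_j$. Since at most one of the sources is Gaussian, then either
$S_j$ or $S_k$ must be non-Gaussian. Let $S_k$ denote the non-Gaussian source. By Theorem 3 the Gaussian
MT-variance $\sigma_{S_k}^{2(g)}(\mu_k, \tau)$ is not constant w.r.t. $\mu_k$ over any open interval. Thus, for almost every $(\mu_j, \mu_k) \in \mathbb{R}^2$
for which $\sigma_{S_j}^{2(g)}(\mu_j, \tau)$ and $\sigma_{S_k}^{2(g)}(\mu_k, \tau)$ have finite values, we have that $\sigma_{S_j}^{2(g)}(\mu_j, \tau) \neq \sigma_{S_k}^{2(g)}(\mu_k, \tau)$.

Hence, the Lebesgue measure of $\mathcal{D}_{j,k}$ is zero for any $j \neq k$. Therefore, by relation (42) and the
sub-additivity of Lebesgue's measure, the set $\mathcal{D}$ must have zero Lebesgue measure. □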



F. Some useful Lemmas 

The following Lemmas can be easily proved using (9)-(11), and the definitions of the exponential and
Gaussian MT-functions in (17) and (23), respectively.

Lemma 1. A random vector $\mathbf{X}$ is Gaussian under the probability measure $P_{\mathbf{X}}$ iff it is Gaussian under
the transformed probability measure $Q_{\mathbf{X}}^{(g)}$ with exponential MT-function.

Lemma 2. A random vector $\mathbf{X}$ is Gaussian under the probability measure $P_{\mathbf{X}}$ iff it is Gaussian under
the transformed probability measure $Q_{\mathbf{X}}^{(g)}$ with Gaussian MT-function.

G. Choice of MT-function parameters 

1) Exponential MTICA: Assume that $\mathbf{t}_1, \ldots, \mathbf{t}_M$ are independent samples from some continuous
probability distribution. According to Theorem 2 if at most one of the sources is Gaussian, then for any
pair $(\mathbf{t}_m, \mathbf{t}_n)$, $m \neq n$, Assumption 3 in Proposition 3 is satisfied with probability 1, which leads to unique
identification of $\mathbf{A}$ based on the corresponding MT-covariance matrices $\boldsymbol{\Sigma}_{\mathbf{X}}^{(g)}(\mathbf{t}_m)$ and $\boldsymbol{\Sigma}_{\mathbf{X}}^{(g)}(\mathbf{t}_n)$.

Motivated by this result we propose the following procedure that randomly generates test-points inside
a unit $\ell_2$-ball:

1) Generate M i.i.d. samples $\boldsymbol{\xi}_m \in \mathbb{R}^p$, m = 1, . . . , M, such that the components of each $\boldsymbol{\xi}_m$ are
statistically independent with uniform distribution on [-1, 1].

2) Generate M i.i.d. samples $c_m$, m = 1, . . . , M, with uniform distribution on [0, 1].

3) Obtain the sequence of test-points:

$$\mathbf{t}_m = c_m \frac{\boldsymbol{\xi}_m}{\|\boldsymbol{\xi}_m\|_2}, \quad m = 1, \ldots, M.$$
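
The three steps above can be sketched in a few lines of Python. This is an illustration rather than the implementation used in the experiments: the array layout (one test-point per row), the variable names `xi` and `c`, and the function name `exponential_test_points` are assumptions made for the example.

```python
import numpy as np

def exponential_test_points(M, p, rng):
    """Draw M random test-points inside the unit l2-ball of R^p."""
    xi = rng.uniform(-1.0, 1.0, size=(M, p))     # step 1: independent U[-1, 1] components
    c = rng.uniform(0.0, 1.0, size=(M, 1))       # step 2: radii drawn from U[0, 1]
    return c * xi / np.linalg.norm(xi, axis=1, keepdims=True)   # step 3: rescale directions

rng = np.random.default_rng(5)
T = exponential_test_points(M=300, p=2, rng=rng)  # M = 300 as used for p = 2
norms = np.linalg.norm(T, axis=1)                 # equal to the radii, hence at most 1
```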

2) Gaussian MTICA: Assume that $\mathbf{t}_1, \ldots, \mathbf{t}_M$ are independent samples from some continuous
probability distribution. According to Theorem 4 if at most one of the sources is Gaussian, then for any
m = 1, . . . , M the Gaussian MT-covariance $\boldsymbol{\Sigma}_{\mathbf{X}}^{(g)}(\mathbf{t}_m, \tau)$ in (25) has distinct eigenvalues with probability
1, which leads to unique identification of the mixing matrix $\mathbf{A}$.

Motivated by this result, and applying the fact that the data is centered and whitened, we propose
to generate M i.i.d. vectors $\mathbf{t}_m$, m = 1, . . . , M, such that the components of each $\mathbf{t}_m$ are statistically
independent with zero mean and unit variance. In all considered examples we used the beta distribution
with identical shape parameters $\alpha = \beta = 3$.
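
A possible way to generate such test-points is sketched below. A beta distribution with shape parameters 3 and 3 has mean 1/2 and variance 1/28 on [0, 1], so the sketch assumes that the draws are translated and scaled with these theoretical moments to obtain zero mean and unit variance; this standardization step and the function name `gaussian_test_points` are assumptions, as the text does not spell them out.

```python
import numpy as np

def gaussian_test_points(M, p, rng, a=3.0, b=3.0):
    """Draw M test-points in R^p with independent, standardized beta(a, b) components."""
    raw = rng.beta(a, b, size=(M, p))                   # components on [0, 1]
    mean = a / (a + b)                                  # theoretical mean
    var = a * b / ((a + b) ** 2 * (a + b + 1.0))        # theoretical variance
    return (raw - mean) / np.sqrt(var)                  # zero mean, unit variance

rng = np.random.default_rng(6)
T = gaussian_test_points(M=1000, p=3, rng=rng)          # M = 1000 as used for p > 2
```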